Conversation

Contributor

a10y commented Jan 30, 2026

No description provided.

Signed-off-by: Andrew Duffy <[email protected]>
a10y force-pushed the aduffy/seq-cuda branch 4 times, most recently from 627f533 to f95d5ec on February 1, 2026 18:29
Comment on lines +10 to +25
__device__ void sequence(
    ValueT *const output,
    ValueT base,
    ValueT multiplier,
    uint64_t len
) {
    const uint64_t worker = blockIdx.x * blockDim.x + threadIdx.x;

    const uint64_t elemStart = MIN(worker * ELEMENTS_PER_THREAD, len);
    const uint64_t elemEnd = MIN(elemStart + ELEMENTS_PER_THREAD, len);

    for (uint64_t idx = elemStart; idx < elemEnd; idx++) {
        output[idx] = static_cast<ValueT>(idx) * multiplier + base;
    }
}

Contributor

Please can we be consistent with the others, which do 32 units of work per thread, but set that with a constant so it's very easy to change moving forward.
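
For illustration, a minimal sketch of what this suggestion could look like, not the code that was merged. It assumes the constant keeps the name `ELEMENTS_PER_THREAD` from the diff; the kernel wrapper `sequence_kernel`, the `THREADS_PER_BLOCK` value, the `MIN` definition, and the `blocks_for` helper are hypothetical names introduced only for this example.

```cuda
#include <cstdint>
#include <cuda_runtime.h>

// Single knob for per-thread work. 32 is the value the review asks for;
// changing it here updates both the kernel and the launch math below.
constexpr uint64_t ELEMENTS_PER_THREAD = 32;

// Hypothetical launch parameter, chosen only for this sketch.
constexpr uint32_t THREADS_PER_BLOCK = 64;

// Assumed definition of the MIN macro used in the diff.
#define MIN(a, b) (((a) < (b)) ? (a) : (b))

// Each thread fills one contiguous chunk of ELEMENTS_PER_THREAD outputs
// with output[idx] = idx * multiplier + base.
template <typename ValueT>
__global__ void sequence_kernel(
    ValueT *const output,
    ValueT base,
    ValueT multiplier,
    uint64_t len
) {
    // Widen before multiplying: blockIdx.x and blockDim.x are 32-bit,
    // so their product could otherwise wrap for very large grids.
    const uint64_t worker =
        static_cast<uint64_t>(blockIdx.x) * blockDim.x + threadIdx.x;

    // Clamp both ends to len so trailing threads do nothing.
    const uint64_t elemStart = MIN(worker * ELEMENTS_PER_THREAD, len);
    const uint64_t elemEnd = MIN(elemStart + ELEMENTS_PER_THREAD, len);

    for (uint64_t idx = elemStart; idx < elemEnd; idx++) {
        output[idx] = static_cast<ValueT>(idx) * multiplier + base;
    }
}

// Hypothetical host helper: number of blocks needed to cover len elements,
// derived from the same constant so host and device cannot drift apart.
inline uint64_t blocks_for(uint64_t len) {
    const uint64_t perBlock = ELEMENTS_PER_THREAD * THREADS_PER_BLOCK;
    return (len + perBlock - 1) / perBlock;  // ceiling division
}
```

A launch under these assumptions would be `sequence_kernel<int32_t><<<blocks_for(len), THREADS_PER_BLOCK>>>(d_out, base, multiplier, len);`, where `d_out` is a device buffer of at least `len` elements. Deriving the grid size from the same constant means a later change to `ELEMENTS_PER_THREAD` only touches one line.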

3 participants